How To Make The World Add Up by Tim Harford

Author: Tim Harford
ISBN: 9781408712221
Publisher: Little, Brown


So far we’ve focused on excessive credulity in the power of the algorithm to extract wisdom from the data it is fed. There’s another, related problem: excessive credulity in the quality or completeness of the dataset.

We explored the completeness problem in the previous chapter. Literary Digest accumulated what might fairly be described as big data. It was certainly an enormous survey by the standards of the day – indeed, even by today’s standards a dataset with 2.4 million people in it is impressive. But you can’t use Literary Digest surveys to predict election results if ‘people who respond to Literary Digest surveys’ differ in some consistent way from ‘people who vote in elections’.

Google Flu Trends captured every Google search, but not everybody who gets flu turns to Google. Its accuracy depended on ‘people with flu who consult Google about it’ not being systematically different from ‘people with flu’. The pothole-detecting app we met in the last chapter fell short because it confused ‘people who hear about and install pothole-detecting apps’ with ‘people who drive around the city’.

How about quality? Here’s an instructive example of big data from an even older vintage than the 1936 US election poll: the astonishing attempt to assess the typical temperature of the human body. Over the course of eighteen years, the nineteenth-century German doctor Carl Wunderlich assembled over a million measurements of body temperature, gathered from more than 25,000 patients. A million measurements! It’s a truly staggering achievement given the pen-and-paper technology of the day. Wunderlich is the man behind the conventional wisdom that normal body temperature is 98.6°F. Nobody wanted to gainsay his findings, partly because the dataset was large enough to command respect, and partly because the prospect of challenging it with a bigger, better dataset was intimidating. As Dr Philip Mackowiak, an expert on Wunderlich, put it, ‘Nobody was in a position or had the desire to amass a dataset that large.’12

Yet Wunderlich’s numbers were off; we’re normally a little cooler (by about half a Fahrenheit degree).13 So formidable were his data that it took more than a hundred years to establish that the good doctor had been in error.*

So how could so large a dataset be wrong? When Dr Mackowiak discovered one of Carl Wunderlich’s old thermometers in a medical museum, he was able to inspect it. He found that it was miscalibrated by two degrees centigrade, almost four degrees Fahrenheit. This error was partly offset by Dr Wunderlich’s habit of taking the temperature of the armpit rather than carefully inserting the thermometer into one of the bodily orifices conventionally used in modern times. You can take a million temperature readings, but if your thermometer is broken and you’re poking around in armpits, then your results will be a precise estimate of the wrong answer. The old cliché of ‘garbage in, garbage out’ remains true no matter how many scraps of garbage you collect.
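The statistical point is worth making concrete. Here is a minimal sketch (my own illustration, not from the book, with invented numbers for the calibration error and the measurement noise): averaging more readings shrinks the random noise, but a constant instrument bias never averages away, so the estimate simply homes in, ever more precisely, on the wrong value.

```python
import random

# Toy illustration (not from the book): a "true" average body temperature,
# and a thermometer with a constant calibration error, loosely in the
# spirit of Wunderlich's faulty instrument. Both numbers are made up.
TRUE_MEAN_F = 98.2          # roughly the modern estimate of mean body temperature
CALIBRATION_BIAS_F = 0.4    # hypothetical constant offset from the instrument

def read_temperature():
    """One noisy reading from the miscalibrated thermometer."""
    noise = random.gauss(0, 1.0)   # patient-to-patient and measurement variation
    return TRUE_MEAN_F + CALIBRATION_BIAS_F + noise

for n in (100, 10_000, 1_000_000):
    readings = [read_temperature() for _ in range(n)]
    estimate = sum(readings) / n
    # More readings reduce the random scatter, but the bias remains:
    # the average converges precisely on the wrong answer.
    print(f"{n:>9,} readings -> average {estimate:.3f}°F (true mean {TRUE_MEAN_F}°F)")
```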

As we saw in the last chapter, the modern version of this old problem is an algorithm that has been trained on a systematically biased dataset.


